153 research outputs found

    Localization of adaptive variants in human genomes using averaged one-dependence estimation.

    Statistical methods for identifying adaptive mutations from population genetic data face several obstacles: assessing the significance of genomic outliers, integrating correlated measures of selection into one analytic framework, and distinguishing adaptive variants from hitchhiking neutral variants. Here, we introduce SWIF(r), a probabilistic method that detects selective sweeps by learning the distributions of multiple selection statistics under different evolutionary scenarios and calculating the posterior probability of a sweep at each genomic site. SWIF(r) is trained using simulations from a user-specified demographic model and explicitly models the joint distributions of selection statistics, thereby increasing its power to both identify regions undergoing sweeps and localize adaptive mutations. Using array and exome data from 45 ǂKhomani San hunter-gatherers of southern Africa, we identify an enrichment of adaptive signals in genes associated with metabolism and obesity. SWIF(r) provides a transparent probabilistic framework for localizing beneficial mutations that is extensible to a variety of evolutionary scenarios.
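
    The posterior mentioned in this abstract can be sketched with the generic averaged one-dependence estimator (AODE) named in the title. The notation here is an assumption, not taken from the abstract: $s_1, \dots, s_n$ are the selection statistics at a site, $\pi$ is a prior sweep probability, and $c$ is a class (sweep or neutral). The exact form used by SWIF(r) may differ.

```latex
% Generic AODE sketch (assumed notation; not quoted from the paper).
% Class-conditional likelihood, averaging over each statistic s_i as the
% single "parent" on which the other statistics are allowed to depend:
\[
  P(s_1,\dots,s_n \mid c) \;\approx\;
  \frac{1}{n}\sum_{i=1}^{n} P(s_i \mid c)\prod_{j \neq i} P(s_j \mid s_i, c)
\]
% Posterior probability of a sweep at the site, by Bayes' rule:
\[
  P(\text{sweep} \mid s_1,\dots,s_n) \;=\;
  \frac{\pi\,P(s_1,\dots,s_n \mid \text{sweep})}
       {\pi\,P(s_1,\dots,s_n \mid \text{sweep})
        + (1-\pi)\,P(s_1,\dots,s_n \mid \text{neutral})}
\]
```

    Averaging over pairwise dependencies is what lets such an estimator use correlated statistics without assuming they are independent given the class.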

    A Tale of Two Haplotypes: The EDA2R/AR Intergenic Region is the most Divergent Genomic Segment between Africans and East Asians in the Human Genome

    Single nucleotide polymorphisms (SNPs) with large allele frequency differences between human populations are relatively rare. The longest run of SNPs with an allele frequency difference of one between the Yoruba of Nigeria and the Han Chinese is found on the long arm of the X chromosome in the intergenic region separating the EDA2R and AR genes. It has been proposed that the unusual allele frequency distributions of these SNPs are the result of a selective sweep affecting African populations that occurred after the Out-of-Africa migration. To investigate the evolutionary history of the EDA2R/AR intergenic region, we characterized the haplotype structure of 52 of its highly-differentiated SNPs. Using a publicly-available dataset of 3,000 X chromosomes from 65 human populations, we found that nearly all human X chromosomes carry one of two modal haplotypes for these 52 SNPs. The predominance of two highly divergent haplotypes at this locus was confirmed using a subset of individuals sequenced to high coverage. The first of these haplotypes, the α haplotype, is at high frequencies in most of the African populations surveyed and likely arose prior to the separation of African populations into distinct genetic entities. The second, the β haplotype, is frequent or fixed in all non-African populations and likely arose in East Africa prior to the Out-of-Africa migration. We also observed a small group of rare haplotypes with no clear relationship to the α and β haplotypes. These haplotypes occur at relatively high frequencies in African hunter-gatherer populations, like the San and Mbuti Pygmies. Our analysis indicates that these haplotypes are part of a pool of diverse, ancestral haplotypes that have now been almost entirely replaced by the α and β haplotypes. We suggest that the rise of the α and β haplotypes was the result of the demographic forces that human populations experienced during the formation of modern African populations and the Out-of-Africa migration. 
    However, we also present evidence that this region is the target of selection in the form of positive selection on the α and β haplotypes and of purifying selection against α/β recombinants.

    Determining ancestry proportions in complex admixture scenarios in South Africa using a novel proxy ancestry selection method

    Admixed populations can make an important contribution to the discovery of disease susceptibility genes if the parental populations exhibit substantial variation in susceptibility. Admixture mapping has been used successfully, but is not designed to cope with populations that have more than two or three ancestral populations. The inference of admixture proportions and local ancestry and the imputation of missing genotypes in admixed populations are crucial in both understanding variation in disease and identifying novel disease loci. These inferences make use of reference populations, and accuracy depends on the choice of ancestral populations. Using an insufficient or inaccurate ancestral panel can result in erroneously inferred ancestry and affect the detection power of GWAS and meta-analysis when using imputation. Current algorithms are inadequate for multi-way admixed populations. To address these challenges we developed PROXYANC, an approach to select the best proxy ancestral populations. From the simulation of a multi-way admixed population we demonstrate the capability and accuracy of PROXYANC and illustrate the importance of the choice of ancestry in both estimating admixture proportions and imputing missing genotypes.

    A Panel of Ancestry Informative Markers for the Complex Five-Way Admixed South African Coloured Population

    Admixture is a well-known confounder in genetic association studies. If genome-wide data is not available, as would be the case for candidate gene studies, ancestry informative markers (AIMs) are required in order to adjust for admixture. The predominant population group in the Western Cape, South Africa, is the admixed group known as the South African Coloured (SAC). A small set of AIMs that is optimized to distinguish between the five source populations of this population (African San, African non-San, European, South Asian, and East Asian) will enable researchers to cost-effectively reduce false-positive findings resulting from ignoring admixture in genetic association studies of the population. Using genome-wide data to find SNPs with large allele frequency differences between the source populations of the SAC, as quantified by Rosenberg et al.'s I_n statistic, we developed a panel of AIMs by experimenting with various selection strategies. Subsets of different sizes were evaluated by measuring the correlation between ancestry proportions estimated by each AIM subset and ancestry proportions estimated using genome-wide data. We show that a panel of 96 AIMs can be used to assess ancestry proportions and to adjust for the confounding effect of the complex five-way admixture that occurred in the South African Coloured population. Department of HE and Training approved list.

    Determining ancestry proportions in complex admixture scenarios in South Africa using a novel proxy ancestry selection method

    Publication of this article was funded by the Stellenbosch University Open Access Fund. The original publication is available at http://www.plosone.org/
    Admixed populations can make an important contribution to the discovery of disease susceptibility genes if the parental populations exhibit substantial variation in susceptibility. Admixture mapping has been used successfully, but is not designed to cope with populations that have more than two or three ancestral populations. The inference of admixture proportions and local ancestry and the imputation of missing genotypes in admixed populations are crucial in both understanding variation in disease and identifying novel disease loci. These inferences make use of reference populations, and accuracy depends on the choice of ancestral populations. Using an insufficient or inaccurate ancestral panel can result in erroneously inferred ancestry and affect the detection power of GWAS and meta-analysis when using imputation. Current algorithms are inadequate for multi-way admixed populations. To address these challenges we developed PROXYANC, an approach to select the best proxy ancestral populations. From the simulation of a multi-way admixed population we demonstrate the capability and accuracy of PROXYANC and illustrate the importance of the choice of ancestry in both estimating admixture proportions and imputing missing genotypes. We applied this approach to a complex, uniquely admixed South African population. Using genome-wide SNP data from over 764 individuals, we accurately estimate the genetic contributions from the best ancestral populations: isiXhosa (33% ± 0.226), ǂKhomani San (31% ± 0.195), European (16% ± 0.118), Indian (13% ± 0.094), and Chinese (7% ± 0.0488). We also demonstrate that the ancestral allele frequency differences correlate with increased linkage disequilibrium in the South African population, which originates from admixture events rather than population bottlenecks.
    Stellenbosch University. MRC Centre for Molecular and Cellular Biology and the DST/NRF Centre of Excellence for Biomedical TB Research. Carnegie Corporation Grant and by the Department of Clinical Laboratory Sciences, University of Cape Town. Publishers' version.

    Cryptic Distant Relatives Are Common in Both Isolated and Cosmopolitan Genetic Samples

    Although a few hundred single nucleotide polymorphisms (SNPs) suffice to infer close familial relationships, high density genome-wide SNP data make possible the inference of more distant relationships such as 2nd to 9th cousinships. In order to characterize the relationship between genetic similarity and degree of kinship given a timeframe of 100–300 years, we analyzed the sharing of DNA inferred to be identical by descent (IBD) in a subset of individuals from the 23andMe customer database (n = 22,757) and from the Human Genome Diversity Panel (HGDP-CEPH, n = 952). With data from 121 populations, we show that the average amount of DNA shared IBD in most ethnolinguistically-defined populations, for example Native American groups, Finns and Ashkenazi Jews, differs from continentally-defined populations by several orders of magnitude. Via extensive pedigree-based simulations, we determined bounds for predicted degrees of relationship given the amount of genomic IBD sharing in both endogamous and ‘unrelated’ population samples. Using these bounds as a guide, we detected tens of thousands of 2nd to 9th degree cousin pairs within a heterogeneous set of 5,000 Europeans. The ubiquity of distant relatives, detected via IBD segments, in both ethnolinguistic populations and in large ‘unrelated’ population samples has important implications for genetic genealogy, forensics and genotype/phenotype mapping studies.

    Genomic Ancestry of North Africans Supports Back-to-Africa Migrations

    North African populations are distinct from sub-Saharan Africans based on cultural, linguistic, and phenotypic attributes; however, the time and the extent of genetic divergence between populations north and south of the Sahara remain poorly understood. Here, we interrogate the multilayered history of North Africa by characterizing the effect of hypothesized migrations from the Near East, Europe, and sub-Saharan Africa on current genetic diversity. We present dense, genome-wide SNP genotyping array data (730,000 sites) from seven North African populations, spanning from Egypt to Morocco, and one Spanish population. We identify a gradient of likely autochthonous Maghrebi ancestry that increases from east to west across northern Africa; this ancestry is likely derived from “back-to-Africa” gene flow more than 12,000 years ago (ya), prior to the Holocene. The indigenous North African ancestry is more frequent in populations with historical Berber ethnicity. In most North African populations we also see substantial shared ancestry with the Near East, and to a lesser extent sub-Saharan Africa and Europe. To estimate the time of migration from sub-Saharan populations into North Africa, we implement a maximum likelihood dating method based on the distribution of migrant tracts. In order to first identify migrant tracts, we assign local ancestry to haplotypes using a novel, principal component-based analysis of three ancestral populations. We estimate that a migration of western African origin into Morocco began about 40 generations ago (approximately 1,200 ya); a migration of individuals with Nilotic ancestry into Egypt occurred about 25 generations ago (approximately 750 ya). Our genomic data reveal an extraordinarily complex history of migrations, involving at least five ancestral populations, into North Africa.
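
    The two generation-to-year conversions in this abstract are consistent with a generation time of about 30 years. That figure is an assumption inferred from the abstract's own ratios (1,200 / 40 and 750 / 25); the abstract does not state it. The function name below is hypothetical, and the sketch only reproduces the arithmetic.

```python
# Assumed generation time in years. Inferred from the ratios quoted in the
# abstract (1,200 / 40 and 750 / 25); the abstract does not state it.
YEARS_PER_GENERATION = 30


def generations_to_years(generations, years_per_generation=YEARS_PER_GENERATION):
    """Convert a time measured in generations to an approximate time in years."""
    return generations * years_per_generation


if __name__ == "__main__":
    # Migration estimates quoted in the abstract, in generations.
    estimates = [
        ("Western African migration into Morocco", 40),
        ("Nilotic migration into Egypt", 25),
    ]
    for label, generations in estimates:
        years = generations_to_years(generations)
        print(f"{label}: {generations} generations, about {years:,} years ago")
```

    A different assumed generation time (for example 25 years) would shift both dates proportionally, so the year figures are only as firm as that assumption.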